The data for following exercises stem from the publication Grassland ecosystem recovery after soil disturbance depends on nutrient supply rate and are publicly available at Dryad. The data were obtained during the long-term field experiment Cedar Creek LTER and target the effects of human disturbances on grassland ecosystem functioning and biodiversity.
seabloom <- read.table(here("2_Modeling/Data_preparation/seabloom-2020-ele-dryad-data/cdr-e001-e002-output-data.csv"),
sep = ",", header = TRUE)
# seabloom <- read.table("~/github/Swiss_SEM/2_Modeling/Data_preparation/seabloom-2020-ele-dryad-data/cdr-e001-e002-output-data.csv",
# sep = ",", header = TRUE)
dim(seabloom)
## [1] 5040 16
str(seabloom)
## 'data.frame': 5040 obs. of 16 variables:
## $ exp : int 1 1 1 1 1 1 1 1 1 1 ...
## $ field : chr "A" "A" "A" "A" ...
## $ plot : int 1 1 1 1 1 1 1 1 1 1 ...
## $ disk : int 0 0 0 0 0 0 0 0 0 0 ...
## $ yr.plowed : int 1968 1968 1968 1968 1968 1968 1968 1968 1968 1968 ...
## $ ntrt : int 9 9 9 9 9 9 9 9 9 9 ...
## $ nadd : num 0 0 0 0 0 0 0 0 0 0 ...
## $ other.add : int 0 0 0 0 0 0 0 0 0 0 ...
## $ year : int 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 ...
## $ dur : int 1 2 3 4 5 6 7 8 9 10 ...
## $ precip.mm : num 879 858 876 994 859 ...
## $ precip.gs : num 374 551 527 503 512 ...
## $ mass.above: num 61.3 135.9 216.9 302.8 586.9 ...
## $ rich : int 13 15 14 14 16 14 8 7 9 13 ...
## $ even : num 0.463 0.156 0.204 0.14 0.269 ...
## $ ens.pie : num 6.02 2.34 2.85 1.96 4.31 ...
Exploring the data reveals 16 variables with each 5040 data points:
exp: treatments in split-plot design: 1 = disturbance (Control or Disked, 35 × 55 m plots) and 2 = nutrient addition (9 levels, 4 × 4 m plots)field: three experimental fields A, B and Cplot: 54 plots within fieldsdisk: disking treatment (0 = intact at start of experiment, 1 = disked at start of experiment)yr.plowed: last year field was plowed for agriculture (A: 1968, B: 1957 and C: 1934)ntrt: nine levels representing different combinations of nitrogen (0 to 27.2 g N year-1 added as NH4NO3) and other nutrients (20 g m−2 year−1 P205; 20 g m−2 year−1 K20; 40 g m−2 year−1 CaCO3; 30.0 g m−2 year−1 MgSO4; 18 μg m−2 year−1 CuSO4; 37.7 μg m−2 year−1 ZnSO4; 15.3 μg m−2 year−1 CoCO2; 322 μg m−2 year−1 MnCl2 and 15.1 μg m−2 year−1 NaMoO4; details see Table S1 in publication). Nutrients were applied twice per year in mid-May and mid-June.nadd: nitrogen additon rate (g/m2/yr)other.add: other nutrient treatment (0 = control, 1 = other nutrients added)year: sampling yeardur: duration of experimentprecip.mm: annual precipitation (mm)precip.gs: growing season precipitation (mm)mass.above: aboveground biomass (g/m2)rich: species richness (species/0.3 m2)even: Simpson’s evennessens.pie: effective number of species, (= probability of interspecific encounter decimal equivalent to inverse Simpson’s diversity) The pairs function yields an overview over the numerical data.
pairs(seabloom[, -c(1:3, 5:6, 8:10, 12, 16)],
lower.panel = NULL)
The disturbance treatment was replicated in the three old-fields (A, B and C) in a completely randomised block design (two treatments in each of three fields for a total of 6 35 × 55 m large plots). In April 1982, in each of the fields, one of these two 35 × 55 m areas was selected to be disturbed with a 45 cm diameter disk harrow pulled by a tractor 20 times in one direction, 20 times perpendicularly and 5 times diagonally to the first passes. Following the disking, the soil was hand raked to smooth the soil and remove any remaining vegetation, so that subsequent colonisation was solely from seeds or small rhizome fragments. Within each of the 6 large plots, the 54 small plots were arrayed in 6 × 9 grid with 1 m buffers between each plot. Aluminium flashing was buried to depth of 30 cm around each plot to prevent horizontal movement of nutrients and spreading of plants through vegetative growth.
The nutrient treatments were replicated six times in a completely randomised design in each of the 35 × 55 m plots (54 4 × 4 m small plots) yielding 324 (6 x 54) plots. The analyses focuses on two nutrient treatments:
At peak biomass (mid-July to late August), all aboveground biomass was clipped in a 3 m by 10 cm strip (0.3 m2) in each plot. Note that there were 4 years when the disturbed plots were not sampled or only sampled in a single field. The biomass was sorted into dead, previous year’s growth (litter) and live, current year’s growth (live biomass). Live biomass was sorted to species, dried to constant mass at 40°C, and weighed to the nearest 0.01 g. We estimated total aboveground biomass as the summed biomass of all non-woody species in each 0.3 m2 sample, converted to g m-2. We excluded woody biomass, because our goal was to estimate annual productivity and most of the woody biomass is from previous year’s growth. Woody plant biomass composed less than 1% of total biomass across the data set.
Shorten this: Species richness is the number of species in each 0.3 m2 sample. We quantified plant diversity as the Effective Number of Species based on the Probability of Interspecific Encounter (ENSPIE), a measure of diversity that is more robust to the effects of sampling scale and less sensitive to the presence of rare species than species richness (Jost, 2006, 2007; Chase and Knight, 2013). ENSPIE is equivalent to the Inverse Simpson’s index of diversity which is calculated as \(1 / \sum_{i=1}^{S} p_i^2\) where S is the total number of species (i.e. species richness) and pi is the proportion of the community biomass reesented by species i (Jost, 2006, 2007; Chase and Knight, 2013). Simpson’s evenness (E) satisfies the main requirements of an evenness index (Smith and Wilson, 1996). In addition, it is directly related to ENSPIE through the relationship E = ENSPIE/S (Smith and Wilson, 1996), thus we can factor diversity directly into its richness and evenness components through the relationship ENSPIE = S*E.
A metamodel summarizes the concept behind a model and links it to theory. Here, the metamodel is visualized as a directed acyclic graph (DAG) which reads as: productivity (biomass) is directly influenced by the environment (nutrients, disturbance and precipitation) on the one hand and biodiversity (richness and evenness) on the other hand. The environment influences also biodiversity and thus, also has an indirect effect on productivity via biodiversity. Describe this interaction thingy!
For simplicity, set the focus on only one year. Then, 240 observations remain.
seabloom <- seabloom[seabloom$year == 2000, ]
dim(seabloom)
## [1] 240 16
First, implement the metamodel into a linear model (LM). For this, three models are necessary: one that accounts for the direct- and two for the indirect effects.
lm.dir <- lm(mass.above ~ nadd + precip.mm + rich + even,
data = seabloom)
summary(lm.dir)
##
## Call:
## lm(formula = mass.above ~ nadd + precip.mm + rich + even, data = seabloom)
##
## Residuals:
## Min 1Q Median 3Q Max
## -218.85 -59.09 -15.35 36.31 689.30
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.4044 38.5707 1.903 0.058242 .
## nadd 6.7044 0.8345 8.034 4.5e-14 ***
## precip.mm NA NA NA NA
## rich 12.7405 3.6505 3.490 0.000576 ***
## even 36.8867 40.5034 0.911 0.363379
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 100.4 on 236 degrees of freedom
## Multiple R-squared: 0.2153, Adjusted R-squared: 0.2053
## F-statistic: 21.58 on 3 and 236 DF, p-value: 2.187e-12
Problem: the estimates for precip.mm are all NAs. Reason is the focus on a single year (i.e. the year 2000) what fixes the value for precipitation.mm to 599.7 mm (i.e. the precipitation in 2000) and as it is impossible to predict anything from a single value, NA is returned.
summary(seabloom$precip.mm)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 599.7 599.7 599.7 599.7 599.7 599.7
Thus, precipitation.mm is excluded from the model from now on.
lm.dir <- lm(mass.above ~ nadd + rich + even, data = seabloom)
summary(lm.dir)
##
## Call:
## lm(formula = mass.above ~ nadd + rich + even, data = seabloom)
##
## Residuals:
## Min 1Q Median 3Q Max
## -218.85 -59.09 -15.35 36.31 689.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.4044 38.5707 1.903 0.058242 .
## nadd 6.7044 0.8345 8.034 4.5e-14 ***
## rich 12.7405 3.6505 3.490 0.000576 ***
## even 36.8867 40.5034 0.911 0.363379
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 100.4 on 236 degrees of freedom
## Multiple R-squared: 0.2153, Adjusted R-squared: 0.2053
## F-statistic: 21.58 on 3 and 236 DF, p-value: 2.187e-12
To account for the indirect effects, two separate LMs are necessary: one with richness and one with evenness as response.
lm.rich <- lm(rich ~ nadd, data = seabloom)
summary(lm.rich)
##
## Call:
## lm(formula = rich ~ nadd, data = seabloom)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0318 -1.7746 -0.5349 1.2254 8.9682
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.03177 0.19477 30.969 < 2e-16 ***
## nadd -0.12856 0.01614 -7.965 6.81e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.216 on 238 degrees of freedom
## Multiple R-squared: 0.2105, Adjusted R-squared: 0.2071
## F-statistic: 63.44 on 1 and 238 DF, p-value: 6.81e-14
lm.even <- lm(even ~ nadd, data = seabloom)
summary(lm.even)
##
## Call:
## lm(formula = even ~ nadd, data = seabloom)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39021 -0.13808 -0.03255 0.11926 0.50176
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.494342 0.017554 28.161 <2e-16 ***
## nadd 0.003422 0.001455 2.353 0.0195 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1998 on 238 degrees of freedom
## Multiple R-squared: 0.02273, Adjusted R-squared: 0.01862
## F-statistic: 5.534 on 1 and 238 DF, p-value: 0.01946
The direct effect model showed that in the year 2000 biomass was statistically significantly positively related to the nitrogen additon rate (nadd) and species richness (rich). Additionally, nitrogen addition rate had a statistically significantly negative influence on species richness and to a lesser extent, also on evenness.
Structural equations aspire to represent cause-effect relationships and thus, they can be used to represent scientific, causal hypotheses. A key feature of structural equation models is the ability to investigate the networks of connections among system components (James B. Grace et al. 2012).
To evaluate the SEMs, load the package lavaan. This powerful piece of software relies on the computation of covariance matrices to fit the structural equations. It comes with full support for categorical data (any mixture of binary, ordinal and continuous observed variables), can handle both latent and composite variables, performs mediation analysis and calculates the magnitude of indirect effects (Rosseel 2012, 2021).
library("lavaan")
## This is lavaan 0.6-9
## lavaan is FREE software! Please report any bugs.
Variables that are highly correlated with \(r > 0.85\) become redundant. Then, one of them can be dropped or they can be modeled summarized as a latent variable (James B. Grace 2006).
Inspecting our set of variables shows that correlations between all variables are fairly low.
round(cor(seabloom[, c(4, 7, 13:15)]), digits = 2)
## disk nadd mass.above rich even
## disk 1.00 0.00 0.05 -0.01 0.02
## nadd 0.00 1.00 0.41 -0.46 0.15
## mass.above 0.05 0.41 1.00 0.00 -0.02
## rich -0.01 -0.46 0.00 1.00 -0.59
## even 0.02 0.15 -0.02 -0.59 1.00
This toy model shall illustrate the logic of SEM. Comparability is eased as it contains the same variables as the LM before. A huge benefit of SEM is that variables can appear as both predictors and responses what allows the evaluation of direct and indirect effects in one go (as shown in the directed acyclic graph (DAG) below).
The table below summarizes the operators of lavaan syntax. For example, an arrow in a DAG is represented by a tilde.
| Formula type | Operator | Mnemonic |
|---|---|---|
| regression | ~ | is regressed on |
| latent variable | =~ | is measured by |
| (residual) (co)variance | ~~ | is correlated with |
| intercept | ~ 1 | intercept |
Exercise: translate the model in the DAG above into lavaan syntax with the help of this table.
simple <-
"mass.above ~ nadd + disk + rich + even
rich ~ nadd
even ~ nadd"
lavaan uses a \(\chi^2\) test to compare the estimated- to the observed covariance matrix to compute the goodness of fit for the SE model under the assumption that all observations are independent and all variables follow a (multivariate) normal distribution (Grace 2006). Note, that these distributional assumptions only apply to endogenous variables, whereas the distribution of exogenous variables has no bearing on assumptions.
We use a graphical method to assess the fit of the endogeneous variables to a normal distribution, the quantile-quantile plots (Q-Q plots). Hereby, the quantiles of the data are compared to those of a theoretical distribution (i.e., the normal distribution). If the data would be normally distributed, the points would match one to one and thus, align diagonally. In the Q-Q plot, this match is indicated by the black line.
The package MVN allows to plot several Q-Q plots at once and also offers several tests to multivariate normal distribution (MVN). Here, we employ the Henze-Zirkler’s test that has been recommended as a formal test of MVN (Mecklin and Mundfrom 2005).
library("MVN")
seabloom.num <- seabloom[, -c(1:10, 11, 12, 16)]
mvn(data = seabloom.num, mvnTest = "hz", univariatePlot = "qqplot")
## $multivariateNormality
## Test HZ p value MVN
## 1 Henze-Zirkler 4.527585 0 NO
##
## $univariateNormality
## Test Variable Statistic p value Normality
## 1 Anderson-Darling mass.above 6.5613 <0.001 NO
## 2 Anderson-Darling rich 4.8536 <0.001 NO
## 3 Anderson-Darling even 3.5399 <0.001 NO
##
## $Descriptives
## n Mean Std.Dev Median Min Max 25th
## mass.above 240 211.002563 112.5962210 179.7370000 26.966000 897.564 140.1585000
## rich 240 4.979167 2.4892201 5.0000000 1.000000 15.000 3.0000000
## even 240 0.522363 0.2016525 0.4962043 0.197217 1.000 0.3741316
## 75th Skew Kurtosis
## mass.above 265.128750 2.1849256 8.7573782
## rich 6.000000 0.9118942 0.7982559
## even 0.636105 0.7113656 -0.1130884
Alternatively, also histograms in comparison overlaid with a normal distribution allows to gain insight into the distribution:
par(mfrow = c(1, 3))
hist(seabloom$mass.above, prob = TRUE, main = "")
x <- seq(min(seabloom$mass.above), max(seabloom$mass.above), length = 400)
f <- dnorm(x, mean = mean(seabloom$mass.above), sd = sd(seabloom$mass.above))
lines(x, f, col = "red", lwd = 2)
hist(seabloom$rich, prob = TRUE, main = "")
x <- seq(min(seabloom$rich), max(seabloom$rich), length = 400)
f <- dnorm(x, mean = mean(seabloom$rich), sd = sd(seabloom$rich))
lines(x, f, col = "red", lwd = 2)
hist(seabloom$even, prob = TRUE, main = "")
x <- seq(min(seabloom$even), max(seabloom$even), length = 400)
f <- dnorm(x, mean = mean(seabloom$even), sd = sd(seabloom$even))
lines(x, f, col = "red", lwd = 2)
Exercise: would you infer that the endogenous variables meet the assumption of being (multivariate) normally distributed from the results of the the Q-Q plots, the Henze-Zirkler test and the histograms? And why (not)?
Now, let’s fit the model with lavaan’s sem function. As the data clearly deviates from a MVN, we use the MLM estimator that provides standard errors and a \(\chi^2\) test statistic robust to non-normality. Hereby, the Satorra-Bentler correction is used to correct the value of the ML-based \(\chi^2\) test statistic by an amount that reflects the degree of kurtosis (Rosseel 2012).1
fit.simple <- sem(simple, data = seabloom, estimator = "MLM")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: some observed variances are (at least) a factor 1000 times larger than
## others; use varTable(fit) to investigate
Oups, the algorithm converges with a warning. Kindly, it informs us how to fix this.
Exercise: let’s obey the software and execute the code from the hint.
varTable(fit.simple)
## name idx nobs type exo user mean var nlev lnam
## 1 mass.above 13 240 numeric 0 0 211.003 12677.909 0
## 2 rich 14 240 numeric 0 0 4.979 6.196 0
## 3 even 15 240 numeric 0 0 0.522 0.041 0
## 4 nadd 7 240 numeric 1 0 8.188 78.895 0
## 5 disk 4 240 numeric 1 0 0.400 0.241 0
This reveals an enormous difference in magnitude between the variance of biomass (mass.above) and the other variables.
To remove this difference in magnitude between the variables, we divide mass.above by 100 what changes the unit from g/m2 to 10 mg/m2. The boxplots show that the range of the variables is now more similar.
seabloom$mass.above <- seabloom$mass.above / 100
boxplot(seabloom[, c(4, 7, 13:15)])
Then, run sem again with the rescaled biomass:
fit.simple <- sem(simple, data = seabloom, estimator = "MLM")
summary(fit.simple, fit.measures = TRUE, rsq = TRUE)
## lavaan 0.6-9 ended normally after 39 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 240
##
## Model Test User Model:
## Standard Robust
## Test Statistic 104.782 103.606
## Degrees of freedom 3 3
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.011
## Satorra-Bentler correction
##
## Model Test Baseline Model:
##
## Test statistic 226.074 259.950
## Degrees of freedom 9 9
## P-value 0.000 0.000
## Scaling correction factor 0.870
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.531 0.599
## Tucker-Lewis Index (TLI) -0.407 -0.203
##
## Robust Comparative Fit Index (CFI) 0.534
## Robust Tucker-Lewis Index (TLI) -0.399
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -822.542 -822.542
## Loglikelihood unrestricted model (H1) -770.151 -770.151
##
## Akaike (AIC) 1663.085 1663.085
## Bayesian (BIC) 1694.411 1694.411
## Sample-size adjusted Bayesian (BIC) 1665.883 1665.883
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.376 0.374
## 90 Percent confidence interval - lower 0.316 0.314
## 90 Percent confidence interval - upper 0.439 0.437
## P-value RMSEA <= 0.05 0.000 0.000
##
## Robust RMSEA 0.376
## 90 Percent confidence interval - lower 0.316
## 90 Percent confidence interval - upper 0.440
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.141 0.141
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## mass.above ~
## nadd 0.067 0.008 7.938 0.000
## disk 0.123 0.134 0.923 0.356
## rich 0.127 0.022 5.812 0.000
## even 0.361 0.338 1.067 0.286
## rich ~
## nadd -0.129 0.016 -8.284 0.000
## even ~
## nadd 0.003 0.002 2.121 0.034
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .mass.above 0.987 0.251 3.930 0.000
## .rich 4.872 0.520 9.367 0.000
## .even 0.040 0.004 11.267 0.000
##
## R-Square:
## Estimate
## mass.above 0.233
## rich 0.210
## even 0.023
This time, the model converged, however, with poor fit:
To improve the model fit, we look for missing paths via modification indices. They indicate a drop in the model \(\chi^2\) value resulting from freeing fixed parameters via including a missing path. 3.84 is considered as the critical threshold, the “single-degree-of-freedom \(\chi^2\) criterion” (James B. Grace 2021).
modindices(fit.simple, minimum.value = 3.84)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 15 rich ~~ even 84.811 -0.261 -0.261 -0.594 -0.594
## 16 rich ~ mass.above 51.301 -10.881 -10.881 -4.969 -4.969
## 17 rich ~ even 84.811 -6.596 -6.596 -0.534 -0.534
## 19 even ~ mass.above 79.659 -0.399 -0.399 -2.248 -2.248
## 20 even ~ rich 84.811 -0.054 -0.054 -0.661 -0.661
In the column mi (for modification index) we look for high values. Note, however, that the modification indices are uninformed suggestions and further adaptations of the model based on their information needs to be based on theory.
In this example, the modification indices indicate–amongst other–a missing relation between richness and evenness. Including this relation into the model would improve its fit by the change in \(\chi^2\) by 84.811. –> still, 104.782 - 84.811 = 19.971, but the updated model says 0.143
This path is necessary as richness and evenness are computationally related to each other (they are not independent quantities). Thus, let’s include a correlation between rich and even (Note: in SEM, a correlation between two variables points to an omitted common cause/variable that drives this correlation). With the function update() it is possible to directly incorporate the missing path into the specified model without rewriting it from scratch:
fit.simple.up <- update(fit.simple, add = "rich ~~ even")
summary(fit.simple.up, rsq = TRUE, fit.measures = TRUE)
## lavaan 0.6-9 ended normally after 45 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 10
##
## Number of observations 240
##
## Model Test User Model:
## Standard Robust
## Test Statistic 0.143 0.138
## Degrees of freedom 2 2
## P-value (Chi-square) 0.931 0.933
## Scaling correction factor 1.037
## Satorra-Bentler correction
##
## Model Test Baseline Model:
##
## Test statistic 226.074 259.950
## Degrees of freedom 9 9
## P-value 0.000 0.000
## Scaling correction factor 0.870
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.039 1.033
##
## Robust Comparative Fit Index (CFI) 1.000
## Robust Tucker-Lewis Index (TLI) 1.040
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -770.223 -770.223
## Loglikelihood unrestricted model (H1) -770.151 -770.151
##
## Akaike (AIC) 1560.445 1560.445
## Bayesian (BIC) 1595.252 1595.252
## Sample-size adjusted Bayesian (BIC) 1563.554 1563.554
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 0.000
## 90 Percent confidence interval - lower 0.000 0.000
## 90 Percent confidence interval - upper 0.037 0.029
## P-value RMSEA <= 0.05 0.961 0.966
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.036
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.007 0.007
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## mass.above ~
## nadd 0.067 0.008 8.561 0.000
## disk 0.123 0.134 0.923 0.356
## rich 0.127 0.035 3.622 0.000
## even 0.361 0.487 0.740 0.459
## rich ~
## nadd -0.129 0.016 -8.284 0.000
## even ~
## nadd 0.003 0.002 2.121 0.034
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .rich ~~
## .even -0.261 0.028 -9.326 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .mass.above 0.987 0.251 3.930 0.000
## .rich 4.872 0.520 9.367 0.000
## .even 0.040 0.004 11.267 0.000
##
## R-Square:
## Estimate
## mass.above 0.218
## rich 0.210
## even 0.023
# standardizedsolution(fit.simple.up)
# inspect(fit.simple.up, "r2")
modindices(fit.simple.up, minimum.value = 3)
## [1] lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## <0 rows> (or 0-length row.names)
Now, the model has a decent fit with a ratio of test statistic and degrees of freedom much smaller than two (i.e. 0.07) and a \(p\)-value of 0.93. Further, also the CFI is now 1.
Another look at the modification indices shows that further modifications would yield only smallest improvements to the model fit, so that we can ignore them confidently:
modindices(fit.simple.up, minimum.value = 0)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 12 nadd ~~ disk 0.000 0.000 0.000 NA 0.000
## 16 rich ~ mass.above 0.003 0.102 0.102 0.046 0.046
## 18 rich ~ disk 0.003 0.013 0.013 0.002 0.005
## 19 even ~ mass.above 0.111 0.057 0.057 0.318 0.318
## 21 even ~ disk 0.111 0.007 0.007 0.017 0.035
## 22 nadd ~ mass.above 0.000 0.000 0.000 0.000 0.000
## 25 nadd ~ disk 0.000 0.000 0.000 0.000 0.000
## 26 disk ~ mass.above 0.585 -0.789 -0.789 -1.809 -1.809
## 27 disk ~ rich 0.020 -0.002 -0.002 -0.008 -0.008
## 28 disk ~ even 0.133 0.056 0.056 0.023 0.023
## 29 disk ~ nadd 0.000 0.000 0.000 0.000 0.000
Estimates, the raw unstandardized coefficientsStd.err, the standard errorZ-value, the analog to \(t\)-values derived from maximum likelihood estimation (ML)P(>|z|), the probability of obtaining a \(z\) of the given value by chanceRegressions …Variances = explained variance for endogenous variables = estimates of the error variancesR-square = the \(1 - error\) variance in standardized terms(James B. Grace 2021)
[JB Grace]
The library lavaanPlot allows to simply and straight-forwardly visualize diagrams from lavaan objects. Since it was removed from CRAN begin of 2021, it needs to be downloaded and installed from the archive. The code below first tests, if lavaanPlot is missing in the local library and if so, installs it from the latest archived repository on CRAN.
Another library that handles lavaan objects would be semPlot.
suppressMessages(lavaanPlot_installed <- require(lavaanPlot))
if (!lavaanPlot_installed) {
install.packages("https://cran.r-project.org/src/contrib/Archive/lavaanPlot/lavaanPlot_0.6.0.tar.gz",
repos = NULL, type = "source")
}
Then, we can plot the results with significance levels displayed as asterisks.
library("lavaanPlot")
lavaanPlot(model = fit.simple.up,
node_options = list(shape = "box", color = "gray",
fontname = "Helvetica"),
edge_options = list(color = "black"),
coefs = TRUE, covs = TRUE, stars = c("covs", "regress"))
A saturated model includes all possible paths. As a result, there are no degrees of freedom left and it is impossible to estimate the model fit…
(James B. Grace 2021)
Adding each a path from disk to richness (rich) and evenness (even) creates a satured model from our simple one.
satur <-
"mass.above ~ nadd + disk + rich + even
rich ~ nadd + disk
even ~ nadd + disk
rich ~~ even"
fit.satur <- sem(satur, data = seabloom, estimator = "MLM")
summary(fit.satur, rsq = TRUE)
## lavaan 0.6-9 ended normally after 51 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 240
##
## Model Test User Model:
## Standard Robust
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## mass.above ~
## nadd 0.067 0.008 8.559 0.000
## disk 0.123 0.134 0.923 0.356
## rich 0.127 0.035 3.622 0.000
## even 0.361 0.488 0.740 0.459
## rich ~
## nadd -0.129 0.016 -8.284 0.000
## disk -0.052 0.293 -0.178 0.859
## even ~
## nadd 0.003 0.002 2.121 0.034
## disk 0.010 0.027 0.366 0.714
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .rich ~~
## .even -0.261 0.028 -9.371 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .mass.above 0.987 0.251 3.930 0.000
## .rich 4.871 0.521 9.357 0.000
## .even 0.040 0.003 11.313 0.000
##
## R-Square:
## Estimate
## mass.above 0.218
## rich 0.211
## even 0.023
Exercise: What modification indices do you expect from a saturated model?
# modindices(fit.satur, minimum.value = 0)
[JBG material]
To evaluate whether the simple or the saturated model perform better globally, we can calculate the Akaike information criterion (AIC) for both. A difference in the AIC of two is considered as informative (Burnham and Anderson 2002).
anova(fit.simple.up, fit.satur)
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
##
## lavaan NOTE:
## The "Chisq" column contains standard test statistics, not the
## robust test that should be reported per model. A robust difference
## test is a function of two standard (not robust) statistics.
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit.satur 0 1564.3 1606.1 0.0000
## fit.simple.up 2 1560.5 1595.2 0.1428 0.13772 2 0.9335
From the ANOVA table we can see the difference in AIC = 3.8572034 is in the favor for the simple model. That said, the AIC penalizes for model complexity by including the number of parameters of a model in the computation (Burnham and Anderson 2002). Consequently, the saturated model is expected to have an higher AIC because it includes two additional paths.
More background on robust corrections to standard errors and test statistics in SEM can be found in Savalei (2014).↩︎